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1  Introduction 


The  eye-brain  system  achieves  three-dimensional  depth  perception  by  taking 
advantage  of  two  separate  and  distinct  images  captured  by  each  eye.  The 
image  of  an  object  projects  on  the  eye’s  retina.  The  image  of  the  left  eye 
will  differ  from  that  of  the  right.  This  is  called  retinal  disparity  [1].  The 
brain  fuses  the  two  images  and  interprets  the  disparity  into  distances  of  the 
object  from  the  eyes.  These  binocular  cues  produce  perception  of  depth  not 
available  with  monocular  vision.  The  speed  with  which  the  human  brain  can 
integrate  the  continuous  sensor  information  streaming  from  the  optic  nerve 
dwarfs  the  capabilities  of  current  computing  hardware. 

Computational  stereo  describes  the  process  of  synthesizing  the  mechanics 
of  the  binocular  vision.  Stereo  pairs  obtained  from  digital  imaging  are  com¬ 
pared  to  extract  three-dimensional  characteristics  of  a  photographed  scene. 
The  mathematics  involve  only  trignometry;  however,  the  number  of  transfor¬ 
mations  for  high  resolution  images  becomes  staggering.  For  this  technique 
to  receive  more  attention,  the  solution  algorithms  must  execute  quickly  and 
efficiently.  Parallel  computing  offers  the  greatest  hope  of  mimicking  the  feats 
of  the  brain. 

Basically,  a  stereo  camera  pair  is  used  to  take  pictures  of  a  scene  where 
range  information  is  required.  Because  of  binocular  parallax,  these  cameras 
(referred  to  as  the  left  and  right  cameras)  will  acquire  slightly  different  images 
of  the  scene  since  they  are  at  different  locations.  This  effect  can  be  seen  in 
Figure  1.  The  camera  origins  are  aligned  along  the  y  and  z  axes  and  are 
only  displayed  along  the  x  axis.  /£,  and  Ip,  are  the  left  and  right  images  of 
the  stereo  camera  pair.  The  point  P  is  in  the  three-dimensional  scene  and  is 
projected  onto  the  left  and  right  camera  photosensitive  plates  at  Pl  and  Pp, 
respectively.  Fl  and  Fp  represent  the  focal  points  of  the  camera  systems.  The 
disparity  is  the  difference  in  the  locations  of  Pl  and  Pp  on  the  photosensitive 
plates,  which  results  from  the  cameras  being  in  different  horizontal  locations. 
This  disparity  and  the  projection  lines  are  used  to  determine  P’s  position  in 
the  world. 

For  example,  a  point  very  distant  from  the  cameras  will  appear  to  be  at 
the  same  vertical  and  horizontal  position  on  monitors  connected  to  the  left 
and  right  cameras.  However,  a  point  close  to  the  camera  pair  will  not  be  in 
the  same  position  on  the  two  monitors.  Instead,  the  point  on  the  left  monitor 
will  be  displaced  (disparity)  to  the  right  from  the  point  on  the  right  monitor. 
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Figure  1:  Stereo  imaging  camera  system. 

One  can  see  this  phenomenon  in  Figure  2.  Notice  that  points  distant  in 
the  picture  are  at  roughly  the  same  location  in  both  left  and  right  images. 
However,  points  close  to  the  camera,  such  as  the  white  spot  on  the  highway, 
have  a  much  greater  disparity  in  the  tw'o  images. 

Once  the  matching  points  are  found  in  an  image,  an  inverse  perspective 
transform  or  simple  triangulation  can  be  used  to  derive  the  two  lines  where 
the  projection  of  the  world  point  strikes  the  photosensitive  plate  of  the  cam¬ 
era.  When  these  lines  are  intersected,  the  three-dimensional  characteristics 
of  the  scene  can  be  recovered  [2], 

The  ability  to  properly  match  stereo  camera  pair  images  is  important 
to  any  application  in  which  distance  or  range  information  must  be  extracted 
from  an  image.  One  such  application  is  plotting  terrain  contours  from  images 
shot  by  camera  pairs  mounted  on  helicopters.  Once  the  images  are  matched,  a 
contour  plot  is  created  of  the  terrain,  which  shows  elevations  and  depressions 
by  computing  how  far  away  the  ground  points  are  from  the  cameras.  The 
computer  vision  and  robotics  fields  use  this  technique  to  allow  machines,  both 
moving  and  stationary,  to  compute  information  about  their  environments 
[3].  This  technique  should  also  find  favor  with  the  military  since  it  does  not 
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Figure  2:  Disparity  indicates  distance  in  the  scene. 

employ  active  sensing  devices.  Active  devices,  such  as  radar  and  lasers,  are 
easily  detected  and  hinder  stealth  operations. 

The  algorithm  described  in  this  paper  is  stochastic.  It  is  an  undirected 
Monte  Carlo  search  through  the  image  space  which  produces  a  very  fine,  glob¬ 
ally  optimized  disparity  map  where  every  pixel  in  one  image  is  matched  with 
its  corresponding  pixel  in  the  stereo  pair.  Just  as  with  other  Monte  Carlo 
algorithms,  this  approach  requires  a  significant  number  of  floating  point  oper¬ 
ations.  However,  the  process  of  matching  pixels  typically  requires  only  local 
interactions.  On  the  computer,  this  translates  into  local  references  to  mem¬ 
ory.  Furthermore,  the  amount  of  processing  for  each  pixel  remains  uniform. 
These  two  properties,  locality  of  memory  reference  and  uniform  computa¬ 
tional  loading,  make  the  algorithm  appear  ideal  for  parallel  processing.  A 
perfectly  parallel  algorithm  should  exhibit  a  linear  speedup  as  a  function 
of  the  number  of  processors  employed.  This  simple  measure  will  serve  as 
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a  benchmark  for  measuring  the  level  of  parallelization  in  the  application  of 
stereo  imaging. 

The  paper  has  two  goals.  The  first  is  to  present  a  method  for  performing 
stereo  image  matching  and  describe  how  it  was  modified  to  exploit  paral¬ 
lelization.  The  second  goal  is  to  discuss  the  timing  behavior  of  the  algorithm 
in  sequential  versus  parallel  modes  to  include  a  comparison  of  different  com¬ 
puter  architectures.  The  sequential  and  parallel  versions  of  the  algorithm 
along  with  actual  timing  results  are  discussed  in  more  detail  in  later  sec¬ 
tions. 


2  Image  Matching 

Stereo  matching  requires  global  optimization.  Since  the  digital  image  data 
maps  pixel  intensities  to  a  relatively  low  resolution  (typically  eight  bits,  im¬ 
plying  256  discrete  levels),  there  are  many  possible  matches  in  the  local  sense. 
That  is  to  say,  swatches  of  one  image  may  appear  to  map  other  portions  of 
the  stereo  pair.  To  perform  stereoscopic  ranging,  the  whole  image  must  be 
taken  into  account.  Hence,  the  requirement  for  global  optimization. 

The  most  popular  optimization  technique  to  locate  a  global  optimum  is 
called  simulated  annealing  [4].  As  the  name  implies,  the  approach  imitates 
a  natural  process.  Annealing  involves  heating  a  solid  to  the  extent  that 
the  molecules  may  randomly  rearrange  themselves  and  then  cool  gradually. 
Slowly  lowering  the  temperature  allows  the  molecules  to  settle  into  the  lowest 
energy  state,  commonly  described  as  thermal  equilibrium.  If  the  temperature 
rate  declines  too  fast,  defects  may  become  frozen  into  the  end  state.  If  ther¬ 
mal  equilibrium  is  maintained  throughout  the  cooling  cycle,  the  final  system 
should  be  a  globally  optimized  structure.  For  example,  perfect  crystals  are 
grown  in  this  manner. 

Basically,  the  simulated  annealing  algorithm  mimics  the  physical  process 
via  an  undirected  Monte  Carlo  search  through  the  image  space.  This  pro¬ 
cedure,  known  as  the  Metropolis  algorithm,  samples  states  in  a  system  at 
equilibrium.  Since  the  system  is  kept  in  thermal  equilibrium,  its  states  have 
a  Boltzman  distribution 

P{E)  =  exp[^]  (1) 

in  which  E  is  energy,  T  is  the  temperature  of  the  system,  and  P(E)  is  the 
probability  of  a  state  having  energy  E.  Random,  local  state  transitions  are 
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read  Ir,  Ir 

D{row,  col)  =  random  number  in  [0 . .  -Dmax] 

T  —  Tmax 

/*  loop  according  to  fixed  annealing  schedule  */ 
while  T  >  Tmin 

iterate  a  fix  number  of  times 
S'  <=  random  state  change  S 
AE  =  E{S')  -  E(S) 

/*  accept  lower  energy  states  */ 

if  AE  <  0  then  S  =  S' 

else 

P  =  exp  [^] 

j  =  random  number  in  [0 ...  1] 

/*  accept  higher  energy  only  with  Boltzman  probability  */ 

if  j  <  P  then  S  =  S' 
reduce  T  by  predefined  percentage 
end  while 


Figure  3:  Simulated  annealing  algorithm  in  pseudo-code. 

performed  by  varying  the  disparity  only  slightly.  The  change  in  energy  that 
would  result  from  the  new  disparity  is  determined.  If  the  new  state  takes 
the  system  to  a  lower  energy  level,  it  is  accepted.  If  the  new  state  takes 
the  system  to  a  higher  energy  level,  the  new  state  is  accepted  only  with 
probability  P  =  exp  [^^]. 

The  simulated  annealing  technique  is  outlined  in  Figure  3.  The  system 
is  taken  to  equilibrium  by  the  Metropolis  algorithm  by  considering  random, 
local  state  transitions  on  the  basis  of  the  change  in  energy  that  they  imply. 
Since  the  system  is  stochastic,  these  local  state  changes  can  take  the  system 
away  from  convergence  as  well  as  toward  it.  This  helps  to  prevent  the  system 
from  sinking  into  local  minima.  The  processing  is  complete  when  the  system 
is  in  equilibrium  at  the  lowest  energy  state  achievable. 

The  rate  of  temperature  reduction  during  annealing  is  determined  based 
upon  the  changes  in  energy  that  occur.  In  the  original  algorithm,  energy  dis¬ 
tributions  are  constantly  monitored  to  determine  when  equilibrium  is  reached 
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and  temperature  should  be  lowered.  Enough  iterations  of  the  algorithm 
should  be  performed  at  each  temperature  to  bring  the  system  to  thermal 
equilibrium.  Temperature  should  only  be  lowered  once  there  is  no  longer 
any  significant  decrease  in  energy.  In  this  version,  a  fixed  annealing  schedule 
is  used  to  limit  global  accumulators  and  synchronization  that  would  disrupt 
parallelization.  Fixed  annealing  schedules  have  been  shown  to  be  effective  for 
these  problems.  Furthermore,  global  accumulators  are  not  required  since  op¬ 
timizing  local  energies  has  the  same  overall  effect  as  optimizing  global  energy 
without  loss  of  correctness  [5]. 

Several  parameters  must  be  set  when  using  simulated  annealing  with  fixed 
cooling  schedules.  The  first  value  that  must  be  determined  is  the  starting 
temperature.  The  initial  value  of  T  must  be  chosen  in  such  a  way  that 
virtually  all  transitions  are  accepted.  This  means  that  exp  —  1.  for  all 
points  on  the  lattice.  A  general  procedure  for  picking  the  initial  value  of  T 
is  as  follows.  Start  with  a  high  value  and  perform  a  number  of  iterations.  If 
the  acceptance  ratio,  which  is  the  number  of  transitions  accepted  divided  by 
the  number  of  transitions  proposed,  is  less  than  a  certain  cutoff  value,  double 
the  value  of  Tq.  An  acceptance  ratio  of  0.8  is  frequently  used  [4]. 

Determining  the  appropriate  number  of  state  transitions  needed  at  each 
temperature  represents  an  important  consideration  for  accurate  simulations. 
This  value  is  highly  dependent  on  the  size  of  the  problem  space.  For  example, 
stereo  matching  an  image  for  which  the  maximum  disparity  is  20  will  require 
more  iterations  than  stereo  matching  an  image  that  has  a  maximum  disparity 
of  10.  Several  techniques  have  been  proposed  to  help  determine  this  number. 
One  widely  employed  rule  is  that  the  average  number  of  iterations  of  the 
algorithm  for  each  temperature  should  be  roughly  equal  to  that  of  the  number 
of  variables  of  the  problem  being  solved  [4].  More  experimental  results  may 
be  used  by  monitoring  the  number  of  iterations  required  to  bring  the  system 
to  thermal  equilibrium  for  the  given  temperature  [3]. 

Another  concern  involves  decreasing  the  control  variable  T.  Only  small 
decrements  to  T  should  be  allowed  to  make  sure  the  system  can  re-adjust  to 
equilibrium  based  on  the  number  of  iterations  that  are  to  be  employed  for 
each  value  of  T.  A  frequently  used  rule  is 

Tk+i  =  a -Tk,  k  =  0,1,2  (2) 

in  which  a  is  a  constant  smaller  than  but  close  to  1  [4]. 
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The  last  major  concern  is  when  to  terminate  the  algorithm.  The  algo¬ 
rithm  may  be  terminated  when  the  state  space  of  the  problem  no  longer 
fluctuates.  Such  a  point  usually  arrives  when  the  value  of  T  is  sufficiently 
low  and  exp  —  0. 

For  this  implementation,  a  similar  cooling  schedule  as  described  by 
Barnard  was  used  [5].  Temperature  is  initially  100  which  easily  allowed  for 
transition  occurrences  of  greater  than  80%.  Ten  passes  through  the  problem 
space  were  performed  for  each  temperature.  This  value  is  very  close  to  the 
maximum  disparity  of  8  for  each  test  case.  Temperature  was  decreased  by 
10%  after  the  10  loop  passes.  This  uses  the  decrement  rule  with  an  a  value 
of  0.9.  The  algorithm  is  terminated  when  T  drops  below  1. 

Annealing  may  be  simulated  in  any  problem  requiring  a  globally  opti¬ 
mized  solution.  The  process  is  most  useful  in  problems  that  are  very  complex, 
have  high  degrees  of  interaction  between  the  elements,  and  do  not  have  abso¬ 
lute  solution  spaces.  Stereo  matching  fits  all  these  criteria.  Since  every  point 
must  be  matched,  the  process  must  match  anywhere  from  roughly  16,000 
to  260,000  points  (image  sizes  usually  range  from  128  x  128  to  512  x  512). 
Matching  one  point  requires  analyzing  how  its  neighbor  point  matched,  and 
how  that  neighbor’s  neighbor  matched,  and  so  forth.  Furthermore,  there  are 
often  areas  of  occlusion  or  wide  areas  of  homogeneous  intensity  in  the  images 
that  do  not  allow  for  precise  matching.  As  one  can  see,  the  problem  quickly 
becomes  one  of  optimizing  over  a  broad  spectrum  of  possibilities. 

To  use  simulated  annealing,  one  must  model  the  problem  as  an  analog 
to  an  actual  physical  system.  This  will  enable  the  algorithm  to  determine 
when  one  state  is  closer  to  being  optimized  than  another  state.  A  function 
is  constructed  which  analyzes  the  current  state  of  the  process  and  assigns 
a  scalar  value  to  it.  This  function  is  known  as  the  energy  function  for  the 
system.  Simulated  annealing  attempts  to  find  a  global  state  that  has  a 
minimum  energy. 

The  function  used  in  this  case  has  two  sources  of  energy  contribution  [6]. 
For  the  image-matching  problem,  the  associated  energy  at  each  pixel  is 

E{r,  c)  =1  hir,  c)  -  Inir,  c  +  D{r,  c))  [  +\  ]  VZ)(r,  c)  ]  (3) 

in  which  II  and  Ir  represent  the  brightness  at  row  r  and  column  c  of  the 
left  and  right  images,  respectively;  D  is  the  disparity;  VZ)  is  the  sum  of  the 
absolute  differences  between  D  and  its  eight  nearest  neighbors;  and  A  is  a 
weighting  factor. 
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The  first  term  in  the  equation  is  known  as  the  brightness  constraint.  It 
basically  states  that  matching  points,  or  pixels,  on  the  left  and  right  images 
should  be  of  approximately  the  same  intensity  value.  For  example,  if  the  left 
and  right  images  are  digitized  into  eight  bit  gray  scale  brightnesses,  a  pixel 
on  the  left  image  with  a  brightness  value  68  should  be  matched  to  a  pixel 
on  the  right  image  with  approximately  the  same  value.  Identical  cameras 
and  settings  should  be  used  to  limit  variations  in  brightness,  contrast,  and 
so  forth. 

The  second  term  is  sometimes  referred  to  as  the  smoothness  constraint. 
It  asserts  that  the  horizontal  shift  of  an  element  should  be  the  same  as  that  of 
its  neighbors.  This  smoothness  condition  is  necessary  since  the  first  contraint 
is  strictly  local,  and  stereo  correspondences  are  locally  ambiguous.  In  other 
words,  without  this  constraint,  surfaces  would  not  be  spatially  coherent. 

For  example,  suppose  the  left  and  right  images  are  composed  of  solid 
black  backgrounds  with  a  white  square  50  x  50  pixels  roughly  in  the  center 
of  the  image.  The  square  in  the  left  image  has  a  disparity  value  of  5  (all 
points  in  the  square  have  been  shifted  5  places  to  the  right).  If  only  the 
brightness  constraint  were  used,  a  point  directly  inside  the  left  edge  of  the 
square  could  have  a  disparity  value  of  anywhere  from  0  to  ~  50  since  its 
brightness  matches  all  other  pixels  in  the  square.  However,  because  of  the 
smoothness  constraint,  the  left  and  right  edge  disparity  value  of  5  will  be 
propagated  to  the  rest  of  the  interior  of  the  square. 

3  Parallelism 

It  is  the  rule,  rather  than  the  exception,  that  algorithms  have  some  aspect 
that  can  benefit  from  parallelism  and  thus  decrease  overall  computation  time. 
Because  only  local  interactions  are  considered  between  points  on  the  image 
lattices,  the  Metropolis  algorithm  can  benefit  greatly  from  parallelism. 

As  one  searches  for  the  most  efficient  way  to  exploit  parallelism,  a  key 
point  to  keep  in  mind  is  the  structure  of  the  data  in  the  problem.  This 
algorithm’s  fundamental  parallelism  is  one  of  a  result-based  nature.  Ideally, 
one  processor  for  each  point  on  the  disparity  map  would  be  active.  Each 
processor  would  be  responsible  for  computing  its  assigned  pixel’s  disparity 
and  that  disparity  alone.  This  is  a  dynamic,  2-D  data  structure;  trapped 
inside  each  array  element  is  a  process  that  computes  Z)(r,  c)  for  each  pixel. 
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At  the  conclusion  of  the  algorithm,  the  processes  would  vanish  and  leave  a 
resultant  disparity  for  the  (r,  c)  in  question. 

Result-based  computations  perform  very  well  on  finely-grained  architec¬ 
tures.  These  are  computers  with  many  processors  that  are  governed  by  one 
master  processor.  An  example  of  this  type  of  architecture  is  available  on 
the  CM-2  computer,  a  connection  machine  built  by  Thinking  Machines,  Inc., 
having  64,936  simple  one  bit  processors  [7].  Each  processor  in  this  model 
would  run  the  same  algorithm  but  on  differing  data  sets.  The  processors 
would  be  assigned  a  group  of  pixels  on  which  to  work.  All  processors  would 
perform  in  lock-step  governed  by  a  master  process.  Since  the  disparity  map 
is  only  updated  after  all  points  have  been  analyzed,  each  processor  can  work 
independently  of  its  neighbor.  However,  finely  grained  architectures  are  not 
very  common  and  good  speedup  can  still  be  achieved  by  switching  to  an 
agenda-based  parallelism  using  a  master- worker  program  [8]. 

Agenda  parallelism  is  very  suitable  for  the  more  common  coarsely-grained 
architectures.  These  are  architectures  that  typically  use  shared  memory  and 
usually  have  fewer  processors  than  fine  grain  machines.  A  master-worker  type 
of  agenda  parallelism  is  used  in  this  instance.  In  this  computational  stereo 
algorithm,  a  master  process  starts  a  series  of  workers  to  perform  the  image¬ 
matching  task.  Basically,  each  worker  executes  the  code  described  by  the 
text  and  flowchart  in  the  previous  section  of  this  paper.  All  workers  execute 
the  same  code;  the  only  difference  is  which  part  of  the  image  the  workers 
get.  This  is  also  often  referred  to  as  SIMD  (single  instruction,  multiple  data) 
computing.  Each  processor  runs  the  same  program  in  this  model.  Each 
processor,  however,  works  only  on  its  own  data  set. 

It  was  initially  planned  to  split  the  N  xN  image  into  blocks  of  size  mxm 
where  m  =  {N f  numherworkers).  For  example,  if  the  image  were  of  size 
128  X  128  and  we  wanted  to  use  four  workers,  each  worker  would  get  a  block 
of  the  image  of  size  64  x  64.  This  would  have  created  overhead,  since  each 
worker  would  have  had  to  communicate  disparity  information  along  its  edge 
to  its  neighboring  workers.  Instead,  since  there  is  almost  no  vertical  disparity 
in  the  image,  the  workers  each  get  a  block  of  the  image  of  size  mxN.  With  the 
128  X 128  and  four  workers  example,  each  worker  would  get  a  block  of  size  32  x 
128.  Each  worker  can  now  compute  without  synchronization  problems  since 
vertical  disparity  does  not  need  to  be  broadcast  and  received.  Since  the  array 
blocks  are  homogeneous,  a  static  scheduling  method  is  used.  Each  processor 
gets  one  contiguous  block  of  the  overall  image.  No  interleaving  of  data  nor 
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dynamic  scheduling  would  be  warranted  wuth  this  data  set.  Relatively  good 
speedup  should  be  achievable  since  there  is  no  interprocess  communication. 


4  Implementation 

Two  different  coding  environments  and  architectures  were  used  for  running 
the  sequential  and  parallel  versions  of  the  simulated  annealing  algorithm. 
The  same  sequential  code  was  run  on  both  architectures  and  it  was  written 
in  C.  This  code  is  listed  in  Appendix  A. 

4.1  Networked  Sun  IPCs  and  C-Linda 

The  first  environment  was  a  distributed  network  of  Sun  workstations  using 
IPC  processors  operating  at  25  MHz.  The  parallel  algorithm  was  written 
in  C-Linda  [8].  This  code  development  environment  is  a  superset  of  the  C 
language  with  special  functionally  added  to  support  Linda  operations  in  a 
distributed  programming  environment.  The  Linda  environment  allows  ma¬ 
chines  on  local  or  wide  area  netw'orks  to  spawn  processes  on  other  machines 
and  also  allows  these  processes  to  communicate  data.  Appendix  B  lists  the 
code  for  C-Linda. 

Linda  uses  a  memory  model  called  tuple  space.  Since  each  computer  on 
the  network  is  distinct,  they  do  not  share  a  common  memory  area.  Tuple 
space  allows  for  a  virtual  shared  memory  area  between  computers  that  may 
contain  both  active  and  passive  tuples.  The  eval  statement  is  used  to  start 
active  tuples.  These  active  tuples  wull  start  parallel  processes  on  the  comput¬ 
ers  on  the  network.  At  run  time,  the  user  specifies  the  number  of  computers 
that  should  be  employed  to  work  on  the  specific  problem.  A  list  of  avail¬ 
able  computers  with  Linda  installed  is  kept  in  local  files  on  each  computer. 
Linda  will  then  ask  the  computers  listed  in  the  valid  file  to  assist.  If  enough 
computers  are  available,  the  code  will  execute.  When  an  eval  statement  is 
executed,  Linda  will  spawn  the  process  to  a  cooperating  computer.  Linda 
will  attempt  some  load  balancing.  The  operating  system  will  not  allow  one 
computer  in  the  link  to  receive  and  process  all  the  eval  statements. 

Data  tuples  are  created  and  consumed  by  the  statements  out  and  in, 
respectively.  Pattern  matching  is  employed  for  these  operations,  which  uses 
both  formals  and  actuals.  Formals  are  variables  that  become  instantiated  and 
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are  matched  according  to  variable  type.  The  statement  out  (1 ,  "test")  will 
create  a  2-D  tuple  in  tuple  space.  It  consists  of  two  actuals.  The  statement 

in(?  i,  ?  text)  consists  of  two  formals  and  will  consume  the  data  and 

give  i  the  value  1  and  text  the  string  "test".  This  is  only  true  if  i  is  a 
variable  of  type  integer  and  text  is  an  array  of  characters. 

Semaphores  are  available  in  Linda  to  prevent  deadlock  and  race  conditions 
which  may  occur  in  certain  in  and  out  data  space  operations.  If  a  process 

is  waiting  to  consume  tuples  and  none  are  available,  it  will  block. 

The  following  listing  shows  a  small  code  section  of  Linda  tuple  space 
operations. 

main  ■[ 

e  val(  Computed) ) 

out(l,  2)  /*  adds  (1,  2)  tuple  to  tuple  space  */ 

/*  in  may  block  il  no  matching  tuples  are  available  */ 
in(l,  ?result)  /*  consumes  (1,  3)  tuple,  result  =  3  */ 

> 

Compute(i)  {  /*  i  =  1  */ 

in(i,  ?data)  /*  consumes  (1,  2)  tuple,  data  -  2  */ 
out(i,  ++data)  /*  out  (1,  3)  tuple  */ 

> 


4.2  Sun  2000  with  Solaris  Threads 

The  second  architecture  used  was  a  Sun  2000  having  eight  processors  operat¬ 
ing  at  50  MHz  each.  The  operating  system  for  this  computer  was  Sun  OS  5.3 
having  Solaris  threads  which  allows  for  multiple  processes  in  a  shared  mem¬ 
ory  environment.  The  code  for  this  implementation  is  listed  in  Appendix 
C.  Each  thread  can  operate  independently  and  asynchronously.  Basically, 
a  thread  is  a  single  sequence  of  steps  performed  by  one  or  more  programs. 
A  thread  runs  through  a  program,  and  there  is  only  one  point  of  execution 
in  a  thread  at  any  instant  [9].  The  new  threads  are  created  by  a  C  library 
call  and  are  dispatched  to  the  processors  in  the  computer  by  the  operating 
system. 

The  system  call  to  create  new  threads  requires  a  function  to  be  specified 
to  which  the  new  thread  will  branch.  Any  number  of  threads  may  be  created, 
but  it  is  generally  inefficient  to  create  more  threads  than  processors.  Doing 
so  will  cause  one  or  more  processors  to  run  the  remaining  threads  which 
will  correspondingly  result  in  their  performance  degradation.  Parallelizing 
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the  code  required  source  level  modifications  of  the  sequential  code  by  adding 
new  procedure  calls  and  library  loads  during  linking.  Appendix  C  lists  the 
code  for  the  program  using  threads. 

5  Measurements 

The  stereo  matching  algorithm  was  tested  with  several  computer-generated 
random  dot  stereograms.  These  stereograms  represent  synthetic  three- 
dimensional  objects.  The  side  of  the  object  facing  the  camera  system  is 
texture  mapped  with  solid  black.  The  solid  black  is  then  speckled  with  ran¬ 
domly  placed  white  pixels.  Figure  4  shows  the  four-tiered  “wedding  cake” 
structure  used  for  testing  this  algorithm. 


Figure  4:  Three-dimensional  wedding  cake  structure. 

The  number  of  white  dots  is  limited  to  10%  of  the  total  image  to  test 
the  robustness  of  the  algorithm.  The  right  image  is  assigned  the  random 
dot  stereomap.  Since  the  object  is  tiered,  a  stereo  camera  system  above  and 
facing  the  object  perceives  the  dots  to  be  at  different  locations  in  the  right 
and  left  images.  This  effect  is  simulated  by  creating  the  left  image  of  the 
stereo  pair  by  shifting  pixels  in  the  right  image  to  the  right.  Pixels  around 
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the  outer  edge  were  not  offset,  the  next  level  in  was  offset  by  2  pixels,  the 
next  level  4  pixels,  and  the  center  was  offset  by  6  pixels.  Pixels  with  high 
movement  represent  areas  that  would  be  close  to  a  camera  looking  down  from 
above  the  wedding  cake  whereas  pixels  with  no  disparity  would  be  distant 
from  the  camera.  Figure  5  shows  the  random  dot  stereo  pair. 


Figure  5:  Random  dot  stereogram.  The  left  image  is  formed  by  displacing 
points  in  the  right  image. 

Because  of  the  well-defined  disparity  maps,  these  random  dot  images 
represent  ideal  cases  for  evaluating  stereo  matching  algorithms.  That  is,  we 
know  how  the  result  should  look,  whereas  in  a  real-world  image,  there  would 
be  some  doubt  as  to  what  an  exact  map  should  look  like.  There  are  still 
some  areas  of  ambiguity  such  as  sections  devoid  of  white  pixels;  however,  the 
overall  structure  of  the  map  remains  clear.  Image  sizes  of  128  x  128,  256  x  256, 
and  512  x  512  were  used.  These  sizes  were  used  since  most  frame  grabbers 
and  digitizers,  as  well  as  some  image-processing  routines,  historically  use 
image  sizes  of  2”. 

The  primary  goal  of  parallelization  is  to  achieve  speedup.  Several  mea¬ 
surement  techniques  are  used  to  judge  algorithm  parallelization  and  archi¬ 
tectural  efficiency.  The  first  metric  is  speedup.  In  an  ideal  environment,  n 
processors  working  on  a  problem  should  be  able  to  solve  it  n  times  faster. 
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(4) 


This  ideal  is  rarely  achieved.  The  actual  speedup  achieved  is  defined  as 


in  which  T;  is  the  linear  (one  processor)  completion  time  for  the  algorithm 
and  Tp  is  the  parallel  completion  time  using  p  processors.  The  second  metric 
used  is  efficiency.  Efficiency  is  defined  as 


Er>  = 


(5) 


This  number  indicates  the  overall  efficiency  of  the  p  processors  working  on  the 
problem.  Ideally,  this  number  should  be  as  close  to  100%  as  possible  but  will 
suffer  because  of  conditions  of  load  imbalancing,  communication  costs,  and 
various  other  parallelization  overheads.  As  an  example  of  these  two  metrics, 
consider  an  algorithm  A  that  takes  45  time  units  to  run  and  a  parallelized 
version  of  A  with  4  processors  that  takes  16  time  units  to  run.  The  above 
metrics  for  this  situation  equate  to 


^4  =  ^  =  2.81,  E,  =  :^  =  0.70 
16  16-4 

or  a  speedup  of  2.81  and  a  processor  efficiency  rating  of  70%. 

Figure  6  shows  both  the  sequential  and  parallel  results  from  matching  the 
random  dot  stereogram  shown  in  Figure  5.  These  results  are  from  the  Sun 
2000  but  are  representative  of  a  solution  from  any  architecture.  The  parallel 
result  was  generated  using  four  processors.  The  disparity  maps  produced 
by  the  algorithm,  which  are  actually  two-dimensional  arrays  with  integer 
disparity  values,  are  encoded  as  gray  scale  values  for  visual  representation. 
Pixels  with  higher  disparity  values  are  closer  in  stereo  (i.e.,  at  the  top  of  the 
structure)  and  are  displayed  as  brighter  shades  of  gray.  Since  the  algorithm 
is  non-deterministic  and  employs  random  number  generators,  some  small 
differences  can  be  seen  in  the  resultant  images.  The  three  hard  horizontal 
edges  in  the  parallel  result  case  stems  from  the  fact  that  the  edge  falls  directly 
on  a  processor’s  border. 

The  three-dimensional  representation  of  the  disparity  map  is  shown  in 
Figure  7.  Notice  that  the  stereo  matching  algorithm  has  very  closely  recov¬ 
ered  the  wedding  cake  structure  shown  in  Figure  4. 
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Figure  6:  Stereo  matching  results.  (Disparity  maps  are  encoded  as  gray 
scale  values.  The  left  image  was  generated  sequentially,  the  right  image  in 
parallel.) 

6  Analysis  and  Interpretation 

All  times  are  based  on  averages  of  approximately  five  trials  per  case.  Figure 
8  shows  the  sequential  time  that  the  algorithm  took  on  both  the  Sun  IPC 
and  2000  machines. 

As  expected,  because  of  faster  processor  speeds,  the  2000  outperformed 
the  IPC  by  about  a  factor  of  4  in  all  cases.  It  can  also  be  seen  that  the  curve 
in  both  cases  follows  the  same  shape  and  that  the  time  it  takes  to  complete 
the  algorithm  grows  proportionally  with  problem  size. 

The  next  charts  (Figures  9  and  10)  show  the  time  the  algorithm  took  for 
multiprocessors  in  both  hardware  environments. 

Eight  workers  (or  threads)  maximum  were  used  for  the  Sun  2000  since 
eight  processors  were  installed  on  the  test  machine  and  16  workers  were  used 
for  the  IPC  network  since  16  workstations  could  easily  be  acquired.  From 
the  shape  of  the  bars,  one  can  see  that  there  are  no  anomalies  in  the  graph. 
In  all  cases,  as  expected,  the  more  processors  working  on  the  image-matching 
problem  produced  smaller  computation  times. 
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Figure  7:  Three-dimensional  representation  of  a  four-level  random  dot  stere¬ 
ogram. 


image  Size 


Figure  8:  Sequential  time  on  Sun  IPCs  and  Sun  2000. 

Figures  11  and  12  show  the  speedup  achieved  on  the  IPC  network.  Notice 
that  the  speedup  per  the  number  of  processors  is  almost  the  same  regardless 
of  the  image  size  on  which  the  matching  was  performed.  For  example,  the 
speedup  for  two  processors  is  almost  the  same  for  the  128  x  128,  256  x  256, 
and  512  x  512  images.  Image  size  does  not  seem  to  play  a  major  role  in 


defining  speedup.  The  case  at  256  x  256  with  16  processors  is  an  outlier,  but 
this  would  undoubtedly  have  stabilized  with  more  testing. 

Speedup  is  achieved  in  all  cases  but  does  not  quite  reach  its  theoretical 
maximum  in  any  case.  Notice  in  Figure  12  that  the  speedup  moves  farther 
away  from  the  line  representing  ideal  speedup  in  the  case  of  more  processors. 
After  some  testing,  it  was  determined  that  the  operations  to  start  workers  on 
distributed  machines,  the  master  outing  data  to  these  machines,  and  these 
machines  consuming  the  data,  were  very  time-inexpensive  operations.  For 
example,  in  a  typical  case  with  the  maximum  number  of  workers  (16),  it  only 
took  roughly  0.2  second  to  start  all  16  workers.  The  master  outing  data  and 
workers  consuming  it  usually  only  took  about  2  seconds  in  these  cases. 
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Figure  11:  Speedup  achieved  using  C-Linda  on  Sun  IPC  network. 
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Figure  12:  Speedup  using  C-Linda  plotted  against  ideal  linear  speedup. 


However,  time  varied  greatly  in  the  amount  of  time  it  took  the  workers 
to  complete.  In  the  16-worker  example,  the  time  interval  from  when  the  first 
worker  completed  to  when  the  last  worker  completed  was  anywhere  from 
30  to  almost  50  seconds.  In  the  256  x  256  image  size  case,  if  all  workers 
completed  at  the  time  the  first  worker  finished,  this  would  reduce  the  time 
by  close  to  40  seconds  and  increase  speedup  to  almost  15.  This  is  also  why 
the  speedup  is  moving  away  from  the  ideal  speedup  line  on  the  graph.  It 
only  takes  one  slow  worker  to  slow  speedup  and  with  more  workers  on  the 
problem,  a  “weak  link”  is  more  likely.  Speedup  is  very  dependent  on  load 
averages  for  the  machines  working  on  the  problem. 
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The  efficiency  for  the  IPC  network  is  shown  in  Figure  13.  The  data  in 
this  chart  is  not  as  orderly,  and  there  does  not  appear  to  be  any  emerging 
pattern.  The  efficiency  is  in  the  range  from  ~  0.65  to  0.8  for  all  cases. 
Eight  processors  seem  to  be  the  most  efficient  for  all  image  sizes.  The  wide 
variations  in  the  graph  are  attributable  to  the  wide  ranges  of  computational 
loads  on  the  machines  in  the  network.  With  the  256  x  256  image,  16-worker 
example  as  stated  previously,  if  all  processors  finished  as  quickly  as  the  first, 
the  efficiency  would  be  close  to  95%.  This  graph  is  very  dependent  on  the 
“weak”  processor  as  well. 


Figure  13:  Efficiency  of  Sun  IPC  network. 

Speedup  achieved  using  threads  on  the  Sun  2000  computer  is  shown  in 
Figures  14  and  15.  Just  as  with  the  Sun  IPCs,  it  can  be  seen  that  speedup 
grows  relatively  consistently.  In  this  case  as  well,  image  size  does  not  seem 
to  play  a  major  role  in  influencing  speedup.  The  speedup  for  two  processors 
is  roughly  the  same  for  the  128  x  128,  256  x  256,  and  512  x  512  image  size 
cases. 

The  speedup  for  two  and  four  processors  is  good.  The  speedup  for  eight 
processors  is  much  less  than  expected.  The  efficiency  of  the  processors  is  also 
very  good  for  the  two  and  four  processor  (threads)  cases.  This  can  be  seen 
in  Figure  16. 

The  efficiency  drops  quickly  and  uniformly  as  more  processors  are  used 
and  is  quite  poor  for  the  eight  processor  case.  Just  as  with  the  IPCs,  there 
were  large  gaps  in  the  time  interval  it  took  for  threads  to  complete.  For 
example,  in  the  256  x  256  image  case  with  eight  workers,  the  average  time 
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Figure  14:  Speedup  achieved  using  threads  on  Sun  2000. 


Figure  15:  Speedup  using  threads  plotted  against  ideal  linear  speedup. 

interval  between  when  the  first  thread  completed  to  when  the  last  thread 
completed  was  approximately  17  seconds.  Adjusting  for  this  delay,  if  all 
threads  completed  as  early  as  the  first  one  did,  speedup  would  increase  to 
6.5  and  efficiency  would  be  close  to  81%. 

There  are  a  few  possible  explanations  for  this  poor  performance  with 
threads.  The  first  reason  is  machine  load.  The  Sun  2000  used  routinely  had 
load  averages  of  5.0  to  10.0  with  an  average  of  60  to  80  users  when  the  test 
cases  were  computed.  The  speedup  will  decrease  as  more  of  the  machine’s 
resources  are  tapped  so  the  efficiency  will  be  higher  in  the  two-processor  case 
than  the  eight-processor  case. 
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Figure  16:  Efficiency  of  threads  on  Sun  2000. 

Another  issue  that  could  not  be  fully  resolved  was  whether  all  eight 
threads  that  were  created  were  actually  mapped  to  eight  different  processors. 
One  processor  sharing  threads  will  slow  the  speedup  dramatically.  During 
some  of  the  testing,  16  threads  were  created.  In  this  case,  two  threads  should 
have  been  mapped  to  each  processor.  When  the  speedup  was  computed,  it 
was  identical  to  the  eight  thread  case  with  an  efficiency  of  roughly  30%.  If 
the  threads  could  not  be  scheduled  on  separate  processors,  this  would  also 
acdount  for  the  poor  speedup  and  poor  efficiency  with  eight  threads. 

One  other  concern  in  a  shared  memory  environment  like  this  one  is  sub- 
page  thrashing.  If  two  or  more  threads  are  trying  to  write  to  the  same  page  or 
sub  page,  some  operating  system  coordination  and  overhead  are  introduced. 
The  2-D  arrays  used  to  store  the  image  gray  scale  values  varied  in  size  and 
there  was  no  way  to  ensure  that  the  arrays  were  subpage  aligned.  This  seems 
to  be  part  of  the  problem  and  explains  why  there  are  slower  times  with  more 
processors.  Four  processors  trying  for  the  same  subpage  will  generate  more 
overhead  than  two  processors  also  competing  for  the  page. 


7  Conclusions 

Simulated  annealing  offers  an  attractive  method  to  solve  large  combinatorial 
problems.  It  is  conceptually  simple  and  relatively  easy  to  implement.  Once 
a  proper  energy  function  or  heuristic  is  defined  for  the  system,  simulated 
annealing  can  become  an  effective  solution  technique  in  many  problem  areas. 
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Its  one  main  drawback  is  completion  time  since  so  many  iterations  must 
be  performed  to  take  the  system  to  its  lowest  energy  configurations.  This 
time  can  be  substantially  reduced  by  using  parallel  computers  and  taking 
advantage  of  the  parallel  nature  of  the  image-matching  algorithm. 

Several  conclusions  may  be  drawn  regarding  the  parallel  architectures 
that  were  used  for  this  algorithm.  For  limited  numbers  of  processors,  the  Sun 
2000  achieves  greater  speedup  and  is  more  efficient  that  the  IPC  network. 
If  this  algorithm  were  to  use  inter-process  communications,  the  2000  would 
probably  be  much  more  efficient.  For  the  two  and  four  processor  case,  the 
shared  memory  of  the  Sun  2000  outperforms  the  IPC  network  for  all  image 
sizes  with  very  high  efficiency  (higher  than  80%  in  all  cases).  The  IPC 
network  with  C  Linda  start  to  show  better  speedup  factors  and  much  better 
efficiency  than  shared  memory  threads  in  cases  when  more  processors  are 
used  and  would  probably  start  to  outperform  (in  time  required  for  algorithm 
completion)  the  faster  Sun  2000  with  some  extension  to  the  graph.  For 
example,  in  all  image  size  cases,  the  time  16  IPC  processors  took  was  very 
close  to  the  time  the  eight  processor  Sun  2000  took  to  perform  the  stereo 
match.  The  shared  environment  of  threads  requires  more  operating  system 
coordination  to  which  the  IPC  network  is  immune.  This  is  readily  visible  by 
comparing  the  efficiency  bar  graphs. 

The  amount  of  speedup  achievable  in  the  IPC  network  was  good  but  was 
slightly  lower  than  was  expected.  The  deviation  is  understandable,  however, 
since  one  slow  worker  will  cause  movement  away  from  ideal  speedup.  It  is 
usually  the  case  that  at  least  one  non-dedicated  machine  on  a  network  will 
have  a  high  load  average.  The  efficiency  was  good,  however,  across  most 
cases  with  the  majority  of  readings  above  70%. 

The  threads  environment  on  the  Sun  2000  was  more  of  a  disappointment 
and  will  require  more  research  into  which  of  the  factors  mentioned  previously 
are  degrading  performance.  Except  for  the  two  and  four  processor  case,  the 
speedup  achieved  was  poor  with  only  a  factor  of  five  speedup  with  eight 
processors  and  an  efficiency  of  less  than  64%.  More  advanced  techniques 
will  have  to  be  employed  to  determine  how  load  averages  and  page  thrashing 
played  a  role  in  this  disappointing  performance. 

In  terms  of  cost  versus  performance  ratios,  the  network  of  Sun  IPCs 
was  quite  high.  A  network  of  these  low-cost  machines  began  to  perform 
comparable  to  the  more  expensive  Sun  2000  in  cases  of  large  problem  sizes. 
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A  Linear  Stereo  Matching  Algorithm 


/* 

Stereo  matching  algorithm  based  on  simulated  annealing. 

Concept  attributable  to  Stephen  Barnard,  "A  Stochastic  Approach  to 
Stereo  Vision"  Readings  in  Computer  Vision,  Addison-Uesley ,  les  York, 

1987 

Author:  Dale  R.  Shires 
Revision  Date:  6/19/94 
lotes : 

See  the  article  for  more  details  on  the  Monte-Carlo  search  method. 

Limited  error  checking  is  employed. 

A  fixed  annealing  schedule  is  currently  used.  Global  accumulators 
could  also  be  used  on  the  energy  to  help  determine  temperature 
reduction . 

The  procedure  uses  many  defines  and  reads  too  images,  one  named 
Leftimagettt  and  Rightimagettt  share  ttt  is  the  number  of  ross  used. 

This  implementation  only  uses  2*n  image  sizes  and  ross  and  columns  must 
be  the  same. 

The  program  currently  does  not  output  the  actual  disparity  values.  Rather 
it  outputs  a  gray  scale  map  representing  these  values.  The  resultant 
map  name  is  STEREO.RESULTS 

*/ 

•include  <math.h> 

•include  <stdio.h> 

•include  <sys/time.h> 

•define  ROWS  512  /*  Image  height  •/ 

•define  COLS  512  /*  Image  sidth  •/ 

•define  LAHBDA.IHPORTAICE  5  /•  weighting  value  •/ 

•define  D_HAX  8  /*  Maximum  disparity  sas  18  •/ 

•define  D_MII  0  /*  Minimum  disparity  •/ 

•define  LEFT.IMAGE  1  /*  Left  image  identifier  •/ 

•define  RIGHT.IMAGE  2  /*  Right  image  identifier  */ 

•define  STOP.CDL  (COLS  -  D.MAX)  /*  limit  disparity  •/ 

•define  RAIDOM.DISPARITY  (randO  %  D.MAX) 

•define  RAIDOM.PROBABILITY  (randO  %  32767  /  32767.0)  /•  [0..1]  •/ 

•define  STARTIIG.TEMP  100.0  /♦  starting  temp  of  the  system  •/ 

•define  EIDIIG.TEMP  1.0  /*  stop  annealing  at  this  temp  •/ 

•define  TEMP.REDUCTIOI  0.1  /•  reduce  temp  this  much  after  loop  •/ 

•define  LATTICE.SCAIS  10  /*  loop  this  many  times  for  each  temp  •/ 


24 


/♦  Define  types ;  ♦/ 

typedef  char  String255[256] ; 
typedef  unsigned  char  Pixel ; 

/•  Global  arrays  */ 

Pixel  leftlmageCROWS] [CDLS] ; 
Pixel  right Image [ROWS] [COLS] ; 
int  disparities [ROWS] [COLS] ; 
int  tempDisparities [ROWS] [COLS] ; 


/*  Read  the  gray  scale  values  of  a  digital  image  stored  in  ASCII  format  */ 

void  ReadlmagelntensityValuesCthelmage,  shich) 

Pixel  thelmage [ROWS] [COLS] ; 
int  shich; 

{ 

register  int  rosCounter,  colCounter; 
register  Pixel  vptr; 

String255  filelame; 
char  pixelValue ; 

FILE  •theFile;  - 

if  (which  ==  LEFT.IHAGE) 

sprintf (filelame ,  "Leftlmage*/,d" ,  ROWS); 

else 

sprintf  (filelame,  "Right  ImageW' ,  ROWS); 


theFile  =  fopen(filoIame,  "r"); 
if  (theFile  ==  lULL) 

printf ("Could  not  open  file  XsVn",  filelame); 

for  (rosCounter  =  0;  rosCounter  <  ROWS;  rosCounter+t) 

{ 

ptr  =  thelmage [rosCounter] ; 

for  (colCounter  =  0;  colCounter  <  COLS;  colCounter++) 

{ 

fscanf (theFile,  "%c",  tpixelValue) ; 

♦ptr  =  (unsigned  char)pixelValue ; 

++ptr ; 

} 

} 

f close(theFile) ; 

}  /♦  end  ReadlmagelntensityValues  ♦/ 


/♦  Generate  a  nes  state  +-  1  unit  from  the  previous  state,  lo  overflow  or 
underflow  is  allowed  ♦/ 
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int  RandomloBState(oldStata) 
register  int  oldState; 

{ 

register  int  randomlumber ; 

if  (RAIDOH.DISPARITY  >  ((D.NAX  /  2)-l)) 

randomlumber  =  -1; 

else 

randomlumber  •  1 ; 


oldState  =  oldState  +  randomlumber; 


if  (oldState  <  D.MIM) 
oldState  =  D_HII; 


else  if  (oldState  >  D_MAX) 
oldState  =  D.HAX ; 


ret urn (oldState) 


}  /*  end  RandomlesState  */ 


/*  Randomized  the  system.  The  disparity  map  is  initialized  in  the  range 
from  [D.HII  . .  D.HAX]  */ 

void  RandomizeSystemO 

{ 

register  int  row,  col; 

for  (row  =  0;  row  <  ROWS;  row++) 
for  (col  =  0;  col  <  COLS;  col++) 
disparities  [row] [col]  =  RAIDOM.DISPARITY ; 

}  /•  end  RandomizeSystem  */ 

/•  Output  the  resultant  disparity  map  in  a  grayscale  format.  0  indicates 
areas  on  no  disparity  and  255  indicates  areas  of  maximum  disparity. 

The  disparity  range  is  divided  into  steps  in  a  256  brightness  range.  */ 

void  CreateGrayScaleHapO 

{ 

register  int  row,  col; 

FILE  *theFile; 

char  filelame[250] ; 

theFile  «  fopenC'STEREO.RESULTS" ,  "w"); 

for  (row  »  0;  row  <  ROWS;  row++) 
for  (col  «  0;  col  <  COLS;  col++) 

fprintf (theFile ,  "%c",  disparities [row] [col]  *  (255  /  D.HAX)); 
fclose(theFile) ; 
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}  /♦  end  CreateGrayScaleHap  */ 


/*  Compute  the  energy  of  a  pixel  based  on  photometric  and  disparity 
parameters.  ♦/ 

int  Energy (row,  col,  disparity) 
int  row,  col,  disparity; 

{ 

register  int  leftintensity ,  rightintensity ; 
register  int  photo; 
int  del; 

del  =  (abs (disparity  -  disparities  [row-1] [col-1] )  + 
abs(disparity  -  disparities[row-l] [col] )  + 
abs(disparity  -  disparities[row-l] [col+1])  + 
abs(disparity  -  disparities[row] [col-1] )  + 
aba(disparity  -  disparities[row] [col+1] )  + 
abs(disparity  -  disparities[row+l] [col-1])  + 
abs(disparity  -  disparities[row+l] [col] )  + 
abs(disparity  -  di8paritios[row+l] [col+1])) ; 


leftintensity  =  leftImage[row] [col] ; 
rightintensity  =  right  Image [row] [col+disparity] ; 
photo  =  abs (leftintensity  -  rightintensity); 

return(photo  +  (LAHBDA.IHPORTAICE  *  del)); 

}  /*  end  Energy  */ 


/*  The  following  energy  functions  are  similar  to  the  function  Energy  but 
cover  the  special  cases  along  the  boarders  where  8  neighbors  are  not 
present .  */ 

int  TopEnergy(col ,  disparity) 
int  col,  disparity; 

{ 

register  int  leftintensity,  rightintensity; 
register  int  photo; 
int  del; 

del  =  (abs (disparity  -  di8parities[0] [col-1])  + 
abs(disparity  -  disparities^  [col+1]  )  + 
abs(di8parity  -  disparitie8[l] [col-1] )  + 
abs (disparity  -  di8paritie8[l] [col] )  + 
abs(di8parity  -  di8paritie8[l] [col+1])) ; 


leftintensity  =  leftlmage[0] [col] ; 
rightintensity  ”  right Imago [0] [col+disparity] ; 
photo  =  abs (leftintensity  -  rightintensity); 
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return(photo  +  (LAHBDA.IHPORTAICE  *  del)); 
}  /*  end  TopEnergy  */ 


int  LeftEnergyCrow,  disparity) 
int  row,  disparity; 

{ 

register  int  leftintensity ,  rightintensity ; 
register  int  photo; 
int  del; 

del  “  (abs(disparity  -  disparities [roo-l] [0] )  + 
abs(disparity  -  disparitiesCroo-l]  [l] )  + 
absCdisparity  -  disparitiesCros] [1])  + 
absCdisparity  -  disparitias[roB+l] [0] )  + 
ab8(disparity  -  disparities [roB+1] [1] )) ; 


leftintensity  =  left  Image [roB] [0] ; 
rightintensity  =  rightImage[roB] [disparity] ; 
photo  «  absCleftlntensity  -  rightintensity); 

returnCphoto  +  (LAMBDA.IHPORTAICE  *  del)); 

}  /•  end  LeftEnergy  •/ 


int  BottomEnergy(col ,  disparity) 
int  col,  disparity; 

{ 

register  int  leftintensity,  rightintensity; 
register  int  photo; 
int  del; 

del  =  (absCdisparity  -  disparities [ROMS-1] [col-1])  + 
absCdisparity  -  disparities[ROllS-l]  [col+1]  )  + 
absCdisparity  -  disparities[(ROWS-l)-l] [col-1])  + 
absCdisparity  -  disparities[(ROWS-l)-l] [col])  + 
absCdisparity  -  disparities[(ROUS-l)-l] [col+1] )) ; 


leftintensity  =  leftImage[ROWS-l] [col] ; 
rightintensity  «  right  Image [ROWS-1] [col+disparity] ; 
photo  =  absCleftlntensity  -  rightintensity); 

returnCphoto  +  (LAMBDA.IHPORTAICE  *  del)); 

}  /*  end  BottomEnergy  */ 


int  TopLeftEnergy (disparity) 
int  disparity; 
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{ 

register  int  leftintensity ,  rightintensity ; 
register  int  photo; 
int  del; 

del  =  (abs (disparity  -  disparities [0] [1])  + 
absCdisparity  -  disparities[l] [0] )  + 
absCdisparity  -  disparities[l] [1] )) ; 

leftintensity  =  left Image [0] [0] ; 
rightintensity  =  rightlmage[0] [disparity] ; 
photo  =  abs (leftintensity  -  rightintensity); 

return(photo  +  (LAHBDA.IMPORTAICE  *  del)); 

}  /*  end  TopLeftEnergy  */ 


int  BottomLeftEnergy(disparity) 
int  disparity; 

{ 

register  int  leftintensity,  rightintensity; 
register  int  photo; 
int  del; 

del  =  (abs (disparity  -  disparities[(ROWS”l)“l] [0])  + 
abs (disparity  ■  disparities[(ROWS"l)“l] [1] )  + 
abs(disparity  -  disparitiesEROWS-l] [1])) ; 

leftintensity  “  leftImage[ROWS-l] [0] ; 
rightintensity  =  rightImage[ROUS-l] [disparity] ; 
photo  =  abs (leftintensity  -  rightintensity); 

return (photo  +  (LABBDA_IHPORTAICE  ♦  del)); 

}  /*  end  BottomLeftEnergy  ♦/ 


/•  Perform  the  simulated  annealing.  */ 

void  Anneal (temperature) 
double  temperature; 

{ 

register  int  row,  col; 

register  int  deltaEnergy,  newEnergy; 

register  int  etempPtrl,  •tempPtr2; 

int  oldEnergy; 

int  newState; 

/*  compute  top  left  energy  ♦/ 

nesState  =  RandomlewState(disparities[0] [0] ) ; 
oldEnergy  =  TopLeftEnergy(di8parities[0] [0]) ; 
newEnergy  =  TopLeftEnergy(newState) ; 
deltaEnergy  =  newEnergy  -  oldEnergy; 
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if  (deltaEnergy  <=  0) 
tempDisparities [0]  [0]  =  nesState; 

else  if  (RAIDOM.PROBABILITY  <  exp((doubl«)-deltaEnergy/temperature) ) 

tempDisparities [0]  [0]  =  nesState; 

else 

tempDisparities [0] [0]  =  disparities[0] [0] ; 

/*  compute  top  row  energy  •/ 

for  (col  =  1;  col  <  STQP.COL;  col++) 

{ 

newState  =  RandomIewState(disparities[0] [col]) ; 
oldEnergy  =  TopEnergy(col ,  diaparities[0] [col] ) ; 
nesEnergy  =  TopEnergy(col ,  newState); 
deltaEnergy  =  newEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 
tempDisparities [0] [col]  =  newState; 

else  if  (RASDQM_PROBABILITY  <  exp( (double)-deltaEnergy /temperature) ) 

tempDisparities [0] [col]  =  newState; 

else 

tempDisparities [0] [col]  =  disparities [0] [col] ; 

} 

/*  compute  left  column  energy  */ 

for  (row  ■=  1;  row  <  (ROWS-D-l;  row++) 

{ 

newState  =  RandomlewState (disparities [row] [0]) ; 
oldEnergy  =  LeftEnergy (row ,  disparities[row] [0] ) ; 
newEnergy  =  LeftEnergy (row ,  newState); 
deltaEnergy  =  newEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 
tempDisparities [row] [0]  =  newState; 

else  if  (RAIDOH.PROBABILITY  <  exp((double)-deltaEnergy/temperature) ) 

tempDisparities [row] [0]  =  newState; 

else 

tempDisparities [row] [0]  “  disparities[row] [0] ; 

} 

/*  compute  bottom  loft  energy  •/ 

newState  «  RandomlewState (disparities [(ROtfS-l)-l] [0] ) ; 
oldEnergy  =  BottomLeftEnergy (disparities [(ROWS-1 )-l] [0] ) ; 
newEnergy  =  BottomLeftEnergy (newState) ; 
deltaEnergy  =  newEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 

tompDisparities[(ROWS-l)-l] [0]  »  newState; 

else  if  (RAIDOM.PROBABILITY  <  oxp((double)-deltaEnergy /temperature)) 

tompDisparities[(ROWS-l)-l] [0]  «  newState; 

else 

tempDisparities[(ROWS-l)-l] [0]  =  disparitio8[(R0WS-l)-l] [0] ; 


/•  compute  bottom  row  energy  ♦/ 
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for  (col  =  1;  col  <  ST0P_CQL;  col++) 

{ 

nesState  -  RemdoRtIeBState(disparities[ROWS-l]  [col])  ; 
oldEnsrgy  =  BottomEnergyCcol ,  diaparitia«[ROWS-l] [col] ) ; 
nasEnergy  =  BottomEnergy(col ,  nesState); 
deltaEnergy  =  newEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 

tempDisparitie8[R0WS-l] [col]  =  nesState; 

else  if  (RAIDOH.PROBABILITY  <  exp( (double)-deltaEnergy /temperature) ) 

tempDisparities [ROUS-1] [col]  =  nesState; 

else 

tempDisparities [ROHS-1] [col]  =  disparities [ROUS-1] [col] ; 

} 

/*  compute  main  grid  energy  ♦/ 

for  (ros  =  1;  ros  <  (ROUS-1 )-l;  ros++) 
for  (col  =  1;  col  <  ST0P_C0L;  col++) 

{ 

nesState  -  RandomIesState(disparitie8[ros] [col]) ; 

oldEnergy  =  Energy (ros,  col,  disparities [ros] [col]) ; 

nesEnergy  =  Energy (ros,  col,  nesState); 

deltaEnergy  =  nesEnergy  -  oldEnergy; 

if  (deltaEnergy  <=  0) 

tempDisparities [ros] [col]  =  nesState; 

else  if  (RAHDOH.PROBABILITY  < 

exp ( (double )-deltaEnergy/temperature)) 

tempDisparities [ros] [col]  =  nesState; 

else 

tempDisparities [ros] [col]  =  disparities [ros] [col] ; 

} 


/*  copy  the  temp  disparity  array  into  the  main  disparity  array  */ 

for  (ros  =  0;  ros  <  ROUS;  ros++) 

< 

tempPtrl  =  disparities [row] ; 
tempPtr2  «  tempDisparities [ros] ; 
for  (col  =  0;  col  <  COLS;  col++) 

•tempPtrl  =  •terapPtr2; 

++tempPtrl ; 

++tempPtr2; 

} 

} 

}  /•  end  Anneal  •/ 


void  main(argc,  argv) 
int  argc; 
char  **argv ; 

{ 


31 


double  currentTemp; 
int  scanCounter; 
int  temp; 

struct  timeval  tpl ,  tp2; 
struct  timezone  tzpl ,  tzp2; 

printf ("Executable  “  %s\r" ,  argv[0]); 
printf("Rous  «  Xd,  Cols  »  Xd\n\n",  ROUS,  COLS); 

/*  road  the  left  and  right  images  */ 

ReadlmagelntensityValuesCleftlmage ,  LEFT.IHAGE) ; 
ReadImageIntensityValuesCrightImage ,  RIGHT.IHAGE) ; 

/*  randomize  the  disparity  map  of  the  system  •/ 

gettimeofday(ttpl ,  ttzpl); 

RandomizeSystemO  ; 

currentTemp  =  STARTIIG.TEMP; 

Bhilo  (currentTemp  >=  EIDUG.TEMP) 

{ 

for  (scanCounter  *  0;  scanCounter  <  LATTICE.SCAIS;  scanCounter++) 

< 

Anneal(currentTemp) ; 
temp  «=  currentTemp; 

} 

currentTemp  -=  (currentTemp  *  TEMP.REDUCTIOI) ; 

} 

gottimeofday (ttp2 ,  ttzp2)  ; 

printf("%ld,  %ld\n" ,  tpl.tv.sec,  tpl .tv_usec) ; 
printf  ("'i(ld ,  %ld\n"  ,  tp2.tv_sec,  tp2 . tv.usec) ; 

CroateGrayScaloMapO  ; 

}  /•  end  main  */ 
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B  C-Linda  Parallel  Stereo  Matching  Algo¬ 
rithm 

/* 

Parallel  stereo  matching  algorithm  based  on  simulated  annealing. 

Concept  attributable  to  Stephen  Barnard,  "A  Stochastic  Approach  to 
Stereo  Vision"  Readins  in  Computer  Vision,  Addison-Besley ,  leu  York, 

1987 

Author;  Dale  R.  Shires 
Revision  Date ;  5/18/94 
■otes : 

This  code  has  been  uritten  for  parallel  processing  oith  C-Linda. 

Appropriate  source  level  annotations  to  the  code  have  been  made. 

See  the  article  for  more  details  on  the  Honte-Carlo  search  method. 

Limited  error  checking  is  employed. 

The  procedure  requires  tuo  files,  one  named  Loftimage  and  one 
named  Rightimage  to  be  present  in  the  directory.  This  implementation 
only  uses  2*n  image  sises  and  the  roes  and  columns  must  be  the  same. 

The •program  currently  does  not  output  the  actual  disparity  values. 

Rather,  it  outputs  a  gray  scale  map  representing  these  values. 

The  resultant  map  name  is  STEREO.RESULTS 

♦/ 


♦include  <8tdio.h> 

♦include  <sys/time.h> 

♦include  <math.h> 

♦define  ROUS  256  /*  the  number  of  rows  in  the  image  */ 

♦define  COLS  256  /•  the  number  of  columns  in  the  image  */ 

♦define  D.HAX  8  /*  maximum  horizontal  disparity  */ 

♦define  D_HII  0  /•  minimum  disparity  */ 

♦define  LAMBDA  5  /•  the  lambda  ueighting  factor  */ 

♦define  WORKERS  16  /*  number  of  workers  to  use  */ 

♦define  LEFT.IMAGE.IAHE  "Left Image" 

♦define  RIGHT.IHAGE.IAHE  "Rightimage" 

♦define  START.TEHP  100.0  /•  starting  temp  of  the  system  */ 

♦define  EID.TEHP  1.0  /*  stop  annealing  at  this  temp  •/ 

♦define  TEHP.REDDCTIOI  0.1  /*  reduce  temp  this  much  after  loop  */ 
♦define  LOOPS  10  /♦  niimber  of  loops  per  temperature  */ 

♦define  RABDOH.DISPARITY  (randO  X  D.HAX) 

♦define  RAIDOM.PROBABILITY  (randO  %  32767  /  32767.0)  /*  [0..1]  ♦/ 
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idefine  STEPSIZE  (ROWS  /  WORKERS)  /♦  number  of  roes  per  Borker  •/ 

typedef  char  String[255]; 

int  RandomleBState(oldState) 
int  oldState; 

{ 

(RAIDOM.DISPARITY  >  ((D.HAX  /  2)  -  1))  ?  —oldState  :  ++oldState; 

(oldState  <  D.HII)  ?  oldState  =  D.MII  :  (oldState  >  D.MAX)  ?  oldState  =  D.HAX  :  1 
return(oldState) ; 

}  /»  end  RandoraleBState  •/ 


void  Readimage (thelmage ,  filelame) 
unsigned  char  thelmage [ROWS] [COLS] ; 

String  filelame ; 

{ 

int  roB,  col; 

FILE  etheFile; 

theFile  =  fopon((char  elfilelame,  "r"); 
if  (theFile  «==  lULL) 

printf ("Error  opening  file  %s\n" ,  filelame); 

for  (row  =  0;  row  <  ROWS;  roB++) 
for  (col  =  0;  col  <  COLS;  col++) 
thelmage [row] [col]  =  getc (theFile) ; 

fcl08e(theFile) ; 

}  /*  end  Readimage  •/ 


void  Croat eO ray ScaleHap (disparities) 
int  disparities [ROWS] [COLS] ; 

{ 

int  roB ,  col; 

FILE  etheFile; 

theFile  «  fopenC'STEREO.RESULTS" ,  "b"); 

for  (row  =  0;  row  <  ROWS;  roB++) 
for  (col  =  0;  col  <  COLS;  col++) 

putc((255  /  D.HAX)  *  disparitios[roB] [col] ,  theFile); 
fclose(theFile) ; 

}  /*  end  CreateGrayScaleHap  */ 
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int  RandomizeHap(theHap) 
int  theMapCSTEPSIZE]  [COLS]  : 

{ 

int  i ,  j  ; 

for  (i  =  0;  i  <  STEPSIZE;  i++) 
for  (j  =  0;  j  <  COLS;  j++) 
theHap[i][j]  =  RAIDOH.DISPARITY ; 

}  /*  end  RandomizeHap  */ 


/*  This  is  the  function  that  each  sorker  gats.  The  function  is  someshat 
long  but  this  sas  done  to  limit  shared  memory  and  numbers  of  calls  to 
make.  There  are  several  special  cases  that  must  be  dealt  sith.  These 
are  areas  share  a  point  does  not  have  8  neighbors,  such  as  the  top  ros, 
side  column,  bottom  ros,  and  topLeft  and  bottomLeft  points  in  the 
block  */ 

int  ComputeDisparities (sorkerlumber) 
int  sorkerlumber ; 

{ 

unsigned  char  leftimage [STEPSIZE] [COLS]  ,  rightimage [STEPSIZE] [COLS]  ; 

int  disparities [STEPSIZE] [COLS] ; 

int  tempDisparities[STEPSIZE] [COLS]  ; 

double  temperature ; 

int  i,  m,  n,  j ,  k; 

int  counter; 

double  currentTemp  =  START.TEHP; 
int  ros,  col,  nesState,  oldState; 
int  oldEnergy,  nesEnergy,  deltaEnergy; 

/*  randomize  my  local  disparity  map  */ 

RandomizeKap(disparities) ; 

/*  read  in  my  chunk  of  the  images  */ 

for  (i  =  0;  i  <  STEPSIZE;  i++) 

in (sorkerlumber ,  i,  ?  lef t Image [i] :n,  ?  rightlmage[i] :m) ; 

outC'done  in  process"); 

/•  start  the  processing  »/ 

shile (currentTemp  >=  EID.TEHP) 

{ 

for  (counter  =  0;  counter  <  LOOPS;  countar++) 

{ 

/*  compute  the  top  left  energy  •/ 

nesState  =  RandomIesState(disparities[0] [0] ) ; 
oldState  «  disparities^  [0]  ; 

oldEnergy  =  ((ab8(disparitie8[0] [1]  -  oldState)  + 
ab8(disparities[l]  [0]  -  oldState)  + 
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abs(disparities[l] [1]  -  oldState))  •  LAHBDA)  + 
absdeftImagaCO]  [0]  -  right  Image  [0]  [oldState]  ) ; 
newEnergy  =  ( (abs (disparities [0] [1]  -  nesState)  + 
abs(disparities[l] [0]  -  nesState)  + 
abs(disparities[l]  [1]  -  newState))  *  LAMBDA)  + 
abs(leftlmage[0] [0]  -  rightimage [0] [neaState] ) ; 

deltaEnergy  =  nesEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 
tempDisparities  [0]  [0]  «  neaState; 

else  if  (RAIDQM.PROBABILITY  <  exp( (doublel-deltaEnergy/currentTemp) ) 

tempDisparities [0] [0]  =  neaState; 

else  tempDisparities [0] [0]  •  disparities [0] [0] ; 

/*  compute  the  bottom  loft  energy  */ 

noaState  =  RandomleaState(disparitias[STEPSIZE-l] [0]) ; 
oldState  =  disparities [STEPSIZE-1] [0] ; 

oldEnergy*  ( (abs (disparities [STEPSIZE-1] [1]  -  oldState)  + 
abs (disparit ios [STEPSlZE-2] [0]  -  oldState)  + 
abs(disparities[STEPSIZE-2] [1]  -  oldState))  •  LAMBDA)  + 
abs(loftImago[STEPSIZE-l] [0]  -  right  Image [STEPSIZE-1] [oldState] ) ; 
neaEnergy  *  ((abs(disparities [STEPSIZE-1] [1]  -  neaState)  + 
abs(disparitios[STEPSIZE-2] [0]  -  neaState)  + 
abs(disparities[STEPSIZE-2] [1]  -  neaState))  •  LAMBDA)  + 
abs (loftImago[STEPSIZE-l] [0]  -  rightImage[STEPSIZE-l] [neaState] ) ; 

deltaEnergy  *  neaEnergy  -  oldEnergy; 
if  (deltaEnergy  <*  0) 

tempDisparities [STEPSIZE-1] [0]  *  neaState; 

else  if  (RAIDOM.PROBABILITY  <  exp( (doublo)-deltaEnergy/currontTomp) ) 
tempDisparities [STEPSIZE-1] [0]  «  neaState; 

else  tempDisparities [STEPSIZE-1]  [0]  =  disparities[STEPSIZE-l] [0] ; 

/•  compute  left  side  energy  */ 

for  (roa  -  1;  roa  <  STEPSIZE-2;  roa++) 

{ 

neaState  »  RandomleaState (disparities [roa] [0] ) ; 
oldState  *  disparities [roa] [0] ; 

oldEnergy  =  ((abs(di8paritio8[roa-l] [0]  -  oldState)  + 
abs(di8paritio8[roa-l] [1]  -  oldState)  + 
abs(disparitios [roa] [1]  -  oldState)  + 
abs(disparities [roa+1] [1]  -  oldState)  + 
abs (disparit ies [roa+1] [0]  -  oldState))  •  LAMBDA)  + 
abs(loftImago[roa] [0]  -  rightimage [roa] [oldState]) ; 

neaEnergy  =  ((abs(disparitios[roa-l] [0]  -  noaState)  + 
abs(disparitios[roa-l] [1]  -  neaState)  + 
abs (disparities [roa] [1]  -  neaState)  + 
abs (disparities [roa+1] [1]  -  neaState)  + 
abs(disparitio8[roa+l] [0]  -  neaState))  •  LAMBDA)  + 
ab8(loftImago[roa] [0]  -  right  Image [roa] [noaState]) ; 

deltaEnergy  *  neaEnergy  -  oldEnergy; 
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if  (deltaEnergy  <=  0) 
tempDisparitiesCrow] [0]  =  newState; 

else  if  (RAIDOH.PROBABILITY  <  exp ( (double) -deltaEnergy/currentTemp) ) 

tempDisparities [roe] [0]  =  neoState; 

else  tempDisparitiesCroo] [0]  =  disparities [ron] [0] ; 

} 

/*  compute  top  row  energy  */ 

for  (col  =1;  col  <  COLS  -  D.HAX;  col++) 

{ 

neuState  =  RandomIeBState(di8paritie8[0] [col]) ; 
oldState  =  disparities [0] [col] ; 

oldEnergy  =  ( (abs (disparities [0] [col-1]  -  oldState)  + 
abs (disparities [1]  [col-1]  -  oldState)  + 
abs(disparities[l] [col]  -  oldState)  + 
abs(disparities[l] [col+1]  -  oldState)  + 
abs(disparitie8[0] [col+1]  -  oldState))  ♦  LAMBDA)  + 
abs(leftlmage[0] [col]  -  rightlmago[0] [col+oldState] ) ; 

newEnergy  =  ((ab8(disparitios[0] [col-1]  -  nevState)  + 
abs(di8paritias[l] [col-1]  -  neoState)  + 
ab8(disparitie8[l] [col]  -  neuState)  + 
abs(disparitios[l] [col+1]  -  newState)  + 
abs(diaparities[0] [col+1]  -  newState))  •  LAMBDA)  + 
abs(leftlmage[0] [col]  -  rightlmage[0] [col+newState]) ; 

deltaEnergy  =  newEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 
tempDisparitiesM  [col]  =  newState; 

else  if  (RAIDOM.PROBABILITY  <  exp((double)-deltaEnergy/currentTemp)) 

tempDisparities[0] [col]  *  newState; 

else  tempDisparities[0]  [col]  =  disparities [0]  [col]  ; 

} 

/*  compute  bottom  row  energy  •/ 

for  (col  =1;  col  <  COLS  -  D.HAX;  col++) 

{ 

newState  =  RandomlewState (disparities [STEPSIZE-1] [col] ) ; 
oldState  *  disparities [STEPSIZE-1] [col] ; 

oldEnergy  =  ((abs(di8paritie8[STEPSIZE-l] [col-1]  -  oldState)  + 
abs(disparities[STEPSIZE-2] [col-1]  -  oldState)  + 
abs (disparities [STEPSIZE-2] [col]  -  oldState)  + 
abs(disparities[STEPSIZE-2] [col+1]  -  oldState)  + 
ab8(di8paritie8[STEPSIZE-l] [col+1]  -  oldState))  *  LAMBDA)  + 
abs(leftImage[STEPSIZE-l] [col]  -  rightlmage [STEPSIZE-1] [col+oldState]) ; 

newEnergy  =  ((abs (disparities [STEPSIZE-1] [col-1]  -  newState)  + 
abs (disparities [STEPSIZE-2] [col-1]  -  newState)  + 
ab8(di8paritie8[STEPSIZE-2] [col]  -  newState)  + 
abs (disparities [STEPSIZE-2] [col+1]  -  newState)  + 
ab8(disparitie8[STEPSIZE-l] [col+1]  -  newState))  •  LAMBDA)  + 
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ab8(leftImage[STEPSIZE-l] [col]  -  rightImage[STEPSIZE-l] [col+neBStat«]) 

deltaEnergy  «  neoEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 

tempDiaparities [STEPSIZE-1] [col]  «  nesState; 

else  if  (RAIDOM.PROBABILITY  <  exp((double)-deltaEnergy/currentTemp) ) 
tempDisparities[STEPSIZE-l] [col]  «  nesState; 

else  tempDisparities [STEPSIZE-1] [col]  «  disparities [STEPSIZE-1] [col] ; 

} 

/*  compute  the  main  grid  energy  •/ 

for  (row  =  1;  row  <  STEPSIZE-1;  roB++) 
for  (col  =  1;  col  <  COLS  -  D.NAX ;  col++) 

{ 

neBState  =  RandomIeBState(di8parities[roB] [col]) ; 
oldState  =  disparities [roB] [col] ; 

oldEnergy  =  ( (abs (disparities [roB-1] [col-1]  -  oldState)  + 
abs(di8parities[roB-l] [col]  -  oldState)  + 
abs(diaparities[roB-l] [col+1]  -  oldState)  + 
abs(disparities[roB] [col-1]  -  oldState)  + 
ab8(disparitie8[roB] [col+1]  -  oldState)  + 
abs(disparities[roB+l] [col-1]  -  oldState)  + 
ab8(disparities[roB+l] [col]  -  oldState)  + 
abs(disparities[roB+l] [col+1]  -  oldState))  *  LAMBDA)  + 
abs (leftImage[roB] [col]  -  rightImage[roB] [col+oldState]) ; 

neBEnergy  =  ((abs(di8paritie8[roB-l] [col-1]  -  neBState)  + 
abs(disparities [roB-1] [col]  -  neBState)  + 
abs(disparities[roB-l] [eol+1]  -  neBState)  + 
abs(disparities[roB] [col-1]  -  neeState)  + 
abs(disparities[roB] [col+1]  -  neeState)  + 
abs(disparitie8[roB+l] [col-1]  -  neBState)  + 
abs(disparities[roB+l] [col]  -  neBState)  + 
abs (disparities [roB+1] [col+1]  -  neBState))  *  LAMBDA)  + 
abs(leftImage[roB] [col]  -  rightlmage [roe] [col+noBState] ) ; 

deltaEnergy  =  neeEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 
tempDisparitie8[roB] [col]  •  neBState; 

else  if  (RAIDOM.PROBABILITY  <  exp((double)-deltaEnergy/currentTemp) ) 

tempDisparities [roB] [col]  =  neBState; 

else  tempDisparities[roB] [col]  =  di8parities[roB] [col] ; 

} 

/*  move  the  temp  disparities  into  the  main  array  •/ 

for  (j  =  0;  j  <  STEPSIZE;  j++) 

for  (k  =  0;  k  <  COLS;  k++) 

disparities [j] [k]  =  tempDi8parities[j] [k] ; 

} 

currentTemp  -=  (currentTemp  *  TEMP.REDUCTIOI) ; 
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} 


/*  output  results  ♦/ 

for  (j  *  0;  j  <  STEPSIZE;  j++) 
outCsorkerlumber,  j,  disparitiesCj] :C0LS) ; 

}  /*  end  ComputeDisparities  */ 


real_main(argc ,  argv) 
int  argc ; 
char  **argv; 

{ 

unsigned  char  left Image [ROWS] [COLS] ,  rightlmage[ROWS] [COLS] ; 

int  disparities [ROWS] [COLS] ; 

int  result [COLS] ; 

int  i,  j,  count; 

int  BorkerlD,  ros,  cols; 

struct  timeval  tpl ,  tp2; 

struct  timezone  tzpl,  tzp2; 

printf ("Executable  =  XsXn" ,  argv[0]); 
printfC'Rous  =  'Li,  Cols  =  y.d\n" ,  ROWS,  COLS); 
printf ("Workers  =  Xd\n\n",  WORKERS); 

/*  set  the  random  seed  to  make  sura  we  gat  different  results  each  time  */ 

gettimeofday (ttpl ,  ttzpl); 

8rand(tpl .tv.sec) ; 

Readlmage(rightlmage ,  RIGHT.IHAGE.IAHE) ; 

Readimagedeftimage ,  LEFT.IHAGE.IAHE)  ; 

start.timerO ; 
timer_split("Start") ; 

/*  start  up  the  workers  ♦/ 

for  (i  =  0;  i  <  WORKERS;  i++) 
eval(ComputeDisparities(i) ) ; 

timer.split ("Workers  started") ; 

/*  out  the  data  to  tuplespace  for  the  workers  to  consume  •/ 
count  =  0; 

for  (i  =  0;  i  <  WORKERS;  i++) 
for  (j  =  0;  j  <  STEPSIZE;  j++) 

{ 

out(i,  j,  leftlmage [count] : COLS,  right Image [count] : COLS) ; 

++count ; 

} 
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timer_split(‘'Data  outed"); 

for  (i=0;  i<  WORKERS;  i++) 
in("done  in  procesa"); 

timar.split ("worker*  inad  data"); 

/•  collect  result*  from  the  worker*  */ 

for  (i  «  0;  i  <  ROWS;  i++) 

< 

in(?  workerlD,  ?  row,  ?  result ; cols)  ; 
for  (j  =  0;  j  <  COLS;  j++) 

di*parities[(workerID  *  STEPSIZE)+row] [j]  «  result[j3; 

} 

timer.split ("Results  Collected")  ; 

/*  create  a  visual  representation  of  the  disparity  map  */ 
CreateGrayScaleMap (disparities) ; 
print.timesO ; 

}  /*  end  raal.main  ♦/ 
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C  Solaris  Threads  Stereo  Matching  Algo¬ 
rithm 

/» 

Parallel  stereo  matching  algorithm  based  on  simulated  annealing. 

Concept  attributable  to  Stephen  Barnard,  "A  Stochastic  Approach  to 
Stereo  Vision"  Readins  in  Computer  Vision,  Addison-Uesley ,  leu  York, 

1987 

Author:  Dale  R.  Shires 
Revision  Date:  5/18/94 
lotes : 

This  code  has  been  oritten  for  parallel  processing  uith  Solaris  threads. 

Appropriate  source  level  annotations  to  the  code  have  been  made. 

See  the  article  for  more  details  on  the  Honte-Carlo  search  method. 

Limited  error  checking  is  employed. 

The  procedure  requires  two  files ,  one  named  Leftiraage  and  one 
named  Rightimage  to  be  present  in  the  directory.  This  implementation 
only  uses  2*n  image  sizes  and  the  rows  and  columns  must  be  the  same. 

The  program  currently  does  not  output  the  actual  disparity  values. 

Rather,  it  outputs  a  gray  scale  map  representing  these  values. 

The  resultant  map  name  is  STEREO.RESULTS 

•/ 


tinclude 

tinclude 

tinclude 

tinclude 

tinclude 

tinclude 


<8tdio .h> 
<sys/time .h> 
<math .h> 
<thread .h> 
<synch.h> 
<errno .h> 


tdefine  ROWS  256  /*  the  number  of  rows  in  the  image  •/ 

tdefine  COLS  256  /♦  the  number  of  columns  in  the  image  */ 
tdefine  D.HAX  6  /*  maximum  horizontal  disparity  */ 
tdefine  D.HII  0  /♦  minimum  horizontal  disparity  »/ 
tdefine  LAMBDA  5  /•  the  lambda  seighting  factor  */ 
tdefine  WORKERS  8  /•  number  of  uorkers  to  use  •/ 
tdefine  LEFT.IHAGE.IAHE  "Leftimage" 
tdefine  RIGHT_IHAGE_IAHE  "Rightimage” 

tdefine  START.TEHP  100.0  /*  starting  temperature  of  the  algorithm  •/ 
tdefine  EHD.TEHP  1.0  /•  ending  temperature  of  the  algorithm  */ 
tdefine  TEHP.REDUCTIOI  0.1  /*  reduce  temp  this  much  after  loop  */ 
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•define  LOOPS  10  /*  number  of  loop*  per  temperature  */ 
•define  RAIDOH.DISPARITY  (randO  %  D.HAX) 

•define  RAIDOM.PROBABILITY  (randO  X  32767  /  32767.0) 
•define  STEPSIZE  (ROWS  /  WORKERS) 

typedef  char  String[255] ; 

unsigned  char  left  Image [ROWS] [COLS] ,  right Image [ROWS] [COLS] ; 
int  di*paritie*[ROWS] [COLS] ,  tempDi*paritie*[ROWS] [COLS] ; 


int  RandomBeeState(oldState) 
int  oldState; 

(RAIDOH.DISPARITY  >  ((D.HAX  /  2)  -  1))  ?  —oldState  :  ++oldState; 
(oldState  <  D.MII)  ?  oldState  «  D.HII  ;  (oldState  >  D.NAX)  ?  oldState  - 
return(oldStata) ; 

}  /*  end  RandomlesState  */ 


void  Writelmage(thelmage,  filelame) 
unsigned  char  thelmage [ROWS] [COLS] ; 

String  filelame ; 

{ 

int  row,  col; 

FILE  etheFile; 

theFile  =  fopen((char  *)filelame,  "w"); 
if  (theFile  ==  lULL) 

printf ("Error  opening  file  %s\n",  filelame); 

for  (row  =  0;  row  <  ROWS;  row++) 
for  (col  =  0;  col  <  COLS;  col++) 
putc(theImage[row] [col] .  theFile) ; 

fclose(theFile)  ; 

}  /•  end  Writelmage  */ 


void  Readimage (thelmage  ,  fileleune) 
unsigned  char  thelmage [ROWS] [COLS] ; 
String  filelame; 

{ 

int  row,  col; 

FILE  etheFile; 

theFile  =  fopen((char  *)filelame,  "r"); 


D.HAX  ;  1 


42 


if  (theFile  ==  lULL) 

printf ("Error  opening  file  %s\n",  filelame); 

for  (roB  =  0;  roB  <  RODS;  roB++) 
for  (col  =  0;  col  <  COLS;  col++) 
thelnageCroB] [col]  =  gotc(theFile) ; 

fclose(theFile) ; 

}  /*  end  Readimage  */ 


void  CreateGrayScaleHap (disparities) 
int  disparit ies [ROWS] [COLS] ; 

{ 

int  roB,  col; 

FILE  etheFile ; 

theFile  =  f open ("STEREO. RESULTS " ,  "b"); 

for  (roB  =  0;  roB  <  ROWS;  roB++) 
for  (col  =  0;  col  <  COLS;  col++) 

putc((255  /  D.MAX)  *  disparities [roB]  [col] ,  theFile); 
fclose(theFile) ; 

}  /•  end  CreateOrayScaleHap  •/ 


int  Randomi2eHap(theHap) 
int  theHap [ROWS] [COLS] ; 

{ 

int  i ,  j  ; 

for  (i  =  0;  i  <  ROWS;  i++) 
for  (j  =  0;  j  <  COLS;  j++) 
theHap[i][j]  =  RAIDOH.DISPARITY; 

}  /*  end  RandomizeHap  */ 


/*  This  is  the  function  that  each  Borker  gets.  The  function  is  someehat 
long  but  this  Bas  done  to  limit  shared  memory  and  numbers  of  calls  to 
make.  There  are  several  special  cases  that  must  be  dealt  Bith.  These 
are  areas  Bhere  a  point  does  not  have  8  neighbors,  such  as  the  top  roB, 
side  column,  bottom  roB,  and  topLeft  and  bottomLeft  points  in  the 
block  */ 

void  ComputeDisparitie8(BorkerIumber) 
int  Borkerlumber ; 

{ 

double  temperature; 
int  i; 
int  n,  m; 
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int  j  ,  k; 
int  counter; 

double  currentTemp  =  START.TEMP; 

int  roB ,  col,  nesState,  oldState; 

int  oldEnergy,  neuEnergy; 

int  deltaEnergy; 

int  startRoB,  stopRoB; 

int  theWorker; 

struct  timeval  tp; 

struct  timezone  tzp; 

/•  set  the  Borker  number  */ 

theWorker  =  Borkerlumber ; 

/♦  determine  start  and  stop  positions  for  this  sorker  */ 

startRoB  =  theWorker  *  STEPSIZE; 
stopRoB  *  StartRoB  +  STEPSIZE; 

gettimeofdaylttp,  ttzp)  ; 

printf  ("Start  %d  y,ld\n" ,  theWorker,  tp.tv.sec); 

/•  start  the  processing  •/ 

Bhile(currantTemp  >=  EID.TEMP) 

{ 

for  (counter  •=  0;  counter  <  LOOPS;  counter++) 

{ 

/*  compute  the  top  loft  energy  •/ 

noBState  •  RandomIoBStato(disparities[8tartRoB] [0] ) ; 
oldState  =  disparitiesCstartRoB] [0] ; 

oldEnergy  =  ((ab8(disparitie8[8tartRoB] [1]  -  oldState)  + 
ab8(disparitie8[8tartRoBtl] [0]  -  oldState)  + 
abs(di8paritio8[8tartRoB+l] [1]  -  oldState))  *  LAHBDA)  + 
abs (left Image [startRoB] [0]  -  rightImage[startRoB] [oldState] ) ; 
noBEnergy  =  ( (abs (disparities [startRoB] [1]  -  noBState)  + 
abs (disparities [startRoB+1] [0]  -  neBState)  + 
abs (disparities [startRoB+l] [1]  -  neBState))  •  LAMBDA)  + 
absdoftImagoCstartRoB]  [0]  -  rightlraageEstartRoB]  [neBState]  ) ; 

deltaEnergy  =  nesEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 

tempDisparities[startRoB] [0]  >  neBState; 
else  if  (RAIDOM.PROBABILITY  <  oxpIIdoublel-deltaEnergy/currentTemp) ) 
tempDisparities[startRoB] [0]  •  neBState; 
else  tempDi8paritie8[startRoB] [0]  «  disparities [startRoB] [0] ; 

/*  compute  the  bottom  left  energy  »/ 

neBState  =  RandomIeBState(di8parities[stopRoB-l] [0] ) ; 
oldState  «  disparities [stopRoB-l] [0] ; 

oldEnergy  «  ((abs(disparities[8topRoB-l] [1]  -  oldState)  + 
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abs (disparities [stopRoB-2] [0]  -  oldState)  + 

abs (disparities [stopRoB-2] [1]  -  oldState))  *  LAMBDA)  + 

abs(leftImage[8topRoB-l] [0]  -  rightlmageCstopRoe-l] [oldState] ) ; 

newEnergy  =  ( (abs (disparities [stopRoo-1] [1]  -  neeState)  + 
abs (disparities [8topRow-2] [0]  -  neeState)  + 
abs(disparitie8[8topRow-2] [1]  -  neoState))  *  LAMBDA)  + 
abs ( left Image [at opRoB- 1] [0]  -  rightlmage[stopRoB-l] [nesState] ) ; 

deltaEnergy  =  newEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 

tempDisparitiesCstopRow] [0]  «  nesState; 
else  if  (RAIDOM.PROBABILITY  <  axp((double)-deltaEnargy/currentTemp) ) 
tempDisparitiesCstopRoB-l] [0]  =  nesState; 
else  tempDiaparities[atopRoB-l] [0]  «  disparities [stopRoe-l] [0] ; 

/♦  compute  left  side  energy  */ 

for  (roB  =  1;  roB  <  8topRoB-2;  roB++) 

{ 

neBState  =  RandomJeBState(disparitie8[roB] [0] ) ; 
oldState  =  disparitiesCroe] [0] ; 

oldEnergy  =  ( (abs (disparities [roB-1]  [0]  -  oldState)  + 
abs  (disparities  [roB-1]  [1]  ••  oldState)  + 
abs (disparities [roB] [1]  -  oldState)  + 
abs(di8paritie8[roB+l] [1]  -  oldState)  + 
abs (disparities [roB+l] [0]  -  oldState))  *  LAMBDA)  + 
abs(leftImage[roB]  [0]  -  rightImage[roB]  [oldState]  ) ; 

neBEnergy  =  ((ab8(disparitie8[roB-l] [0]  -  neBState)  + 
abs (disparities [roB-1] [1]  -  neBState)  + 

*  abs (disparities [roB] [1]  -  neBState)  + 

ab8(disparitie8[roB+l] [1]  -  neBState)  + 

abs (disparities [roB+1] [0]  -  neBState))  *  LAMBDA)  + 

ab8(leftImage[roB] [0]  -  rightImage[roB] [neBState] ) ; 

deltaEnergy  =  neBEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 

tempDi8parities[roB] [0]  =  neBState; 
else  if  (RAIDOM.PROBABILITY  <  exp((double)-deItaEnergy/currentTemp) ) 
tempDisparities[roB] [0]  =  neBState; 
else  tempDi8parities[roB]  [0]  =  disparitiesUos]  [0]  ; 

} 

/*  compute  top  roB  energy  */ 

for  (col  =1;  col  <  COLS  -  D.HAX;  col++) 

{ 

neBState  =  RandomIeBState(di8paritie8[8tartRoB] [col]) ; 
oldState  *  disparities [startRoe] [col] ; 

oldEnergy  =  ((abs(di8paritie8[8tartRoB] [col-1]  -  oldState)  + 
abs (dispar it ies [startRoB  +  1] [col-1]  -  oldState)  + 
abs (disparities [startRoB  +  1] [col]  -  oldState)  + 
abs (disparities [startRoB  +  1] [col+1]  -  oldState)  + 
ab8(disparities [startRoe] [col+1]  —  oldState))  ♦  LAMBDA)  + 
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absdeftlmageCstartRos]  [col]  -  rightlmagoCstartRoo]  [col+oldState]  ) ; 

newEnorgy  «  ((absCdisparitieaCatartRoB] [col-1]  -  nasState)  + 
abs (disparities [startRow  +  1] [col-1]  -  nenState)  + 
abs(disparities[startRoB  +  1] [col]  -  nesState)  + 
abs(disparities[startRoB  +  1] [col+1]  -  nevState)  + 
abs(disparitie8[8tartRoB] [col+1]  -  nsBState))  »  LAMBDA)  + 
abs(leftIiiiage[startR0B]  [col]  -  right  Image  [startRoB]  [col+neBState]  ) ; 

deltaEnergy  =  neBEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 

tempDi8parities[BtartRoB] [col]  =  neBState; 

else  if  (RAIDOH.PRQBABILITY  <  exp((double)-deltaEnergy/currentTemp) ) 
tempDisparities [startRoB] [col]  =  neBState; 

else  tempDisparities [startRoB] [col]  =  disparities[8tartRoB] [col] ; 

} 

/*  compute  bottom  roB  energy  */ 

for  (col  =  1;  col  <  COLS  -  D.MAX ;  col++) 

{ 

neBState  =  RandomSeBState (disparities [stopRoB-1] [col] ) ; 
oldState  *  disparities [stopRoB-1] [col] ; 

oldEnergy  =  ((ab8(disparities[8topRoB-l] [col-1]  -  oldState)  + 
ab8(di8paritie8[stopRoB-2] [col-1]  -  oldState)  + 
ab8(di8parities[8topRoB-2] [col]  -  oldState)  + 
ab8(di8parities[8topRoB-2] [col+1]  -  oldState)  + 
ab8(di8parities[8topRoB-l] [col+1]  -  oldState))  •  LAMBDA)  + 
abs(leftImage[stopRoB-l] [col]  -  right  Image [stopRoB-l] [col+oldState] ) ; 

neBEnergy  =  ((abs(disparitie8[stopRoB-l]  [col-1]  -  neBState)  + 
abs (disparities [stopRoB-2] [col-1]  -  neBState)  + 
abs(di8paritie8[8topRoB-2] [col]  -  neBState)  + 
ab8(disparlties [stopRoB-2] [col+1]  -  neBState)  + 
abs(disparities [stopRoB-1]  [col+1]  -  neBState))  •  LAMBDA)  + 
abs(leftImage[stopRoB-l] [col]  -  rightImage[stopRoB-l] [col+neBState] ) ; 

deltaEnergy  =  neBEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 

tempDisparities [stopRoB-1] [col]  «  nenState; 

else  if  (RAIDOM.PRDBABILITY  <  exp( (double)-deltaEnergy/currontTemp) ) 
tempDisparities [stopRoB-l] [col]  »  nenState; 

else  t8mpDisparitios[stopRoB-l] [col]  =  di8parities[stopRoB-l] [col] ; 

} 

/*  compute  the  main  grid  energy  •/ 

for  (roB  =  startRoB+l;  ron  <  stopRon-l;  roB++) 
for  (col  «  1;  col  <  COLS  -  D.HAX ;  col++) 

{ 

nenState  «  RandomlenState (disparities [ron] [col]) ; 
oldState  «  di8paritie8[roB] [col] ; 

oldEnergy  «  ((abs(disparities[roB-l] [col-l]  -  oldState)  + 
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abs (disparities [roB-1] [col]  -  oldState)  + 
abs(disparitios[roH-l] [col+1]  -  oldState)  + 
abs (disparities [row]  [col-1]  -  oldState)  + 
abs(disparities[roB]  [col+1]  -  oldState)  + 
abs(disparitie8[row+l] [col-1]  -  oldState)  + 
ab8(di8paritie8[roB+l] [col]  -  oldState)  + 
ab8(disparitie8[row+l] [col+1]  -  oldState))  •  LAHBDA)  + 
abs(leftImage[roB] [col]  -  rightlBago[roB] [col+oldState]) ; 

newEnergy  =  ((ab8(disparitio8[ro¥-l] [col-1]  -  nesState)  + 
abs (disparities [row-1] [col]  -  newState)  + 
abs(di8parities[row-l] [col+1]  -  newState)  + 
abs (disparities [row] [col-1]  -  newState)  + 
abs(disparitie8[row] [col+1]  -  newState)  + 
abs(disparities[row+l] [col-1]  -  newState)  + 
abs (disparities [row+1] [col]  -  newState)  + 
abs (disparities [row+1] [col+1]  -  newState))  ♦  LAHBDA)  + 
abs(leftImage[row]  [col]  -  rightIniage[row]  [col+newState] ) ; 

deltaEnergy  =  newEnergy  -  oldEnergy; 
if  (deltaEnergy  <=  0) 
tempDi8paritie8[row] [col]  =  newState; 

else  if  (RAIDOH.PROBABILITY  <  exp((double)-deltaEnergy/currentTemp) ) 

tempDi8parities[row] [col]  «  newState; 

else  tempDisparities[row] [col]  =  disparities[row] [col] ; 

} 

/•  move  the  temp  disparities  into  the  main  array  */ 

for  (j  =  startRow;  j  <  stopRow;  j++) 
for  (k  =  0;  k  <  COLS;  k++) 
disparitie8[j] [k]  *  tempDisparities[j] [k] ; 

} 

currentTemp  -=  (currentTemp  *  TEHP.REDUCTIOI) ; 

} 

gettimeofday(ttp,  ttzp) ; 

printf("Stop  Xd  Xld\n" ,  theWorker,  tp.tv.sec); 

}  /♦  end  ComputeDisparities  */ 


int  main(argc,  argv) 
int  argc ; 
char  **argv; 

{ 

int  i ,  j ,  count ; 

int  workerlD,  row,  cols; 

struct  timeval  tpl ,  tp2,  tp3; 

struct  timezone  tzpl ,  tzp2,  tzp3; 

void  •stat; 

thread.t  thr; 
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printf ("Executable  «  %8\n".  argv[0]): 
printfC'RoBs  =  5Cd ,  Cole  =  X<i\n\n",  ROMS,  COLS); 

RandomizeHap(disparlties) ; 

gettimeofday (ttpl ,  Ittzpl); 
srandCtpl  .tv.sec) ; 

Readimage (right Image,  RIGHT.IHAGE.IAME) ; 

Readlmagedeftlmage,  LEFT.IHAGE.IAME) ; 
gettimeofday (ttpl ,  ttzpl); 

for  (uorkerlD  =  0;  oorkerlD  <  WORKERS;  BorkerID++) 

thr.createdULL ,  0,  ComputeDisparities ,  sorkerlD,  THR_IEW_LWP,  lULL)  ; 
gattimeofday(ttp2 ,  Jttzp2)  ; 

for  (sorkerlD  =  0;  workerlD  <  WORKERS;  BorkerID++) 
thr_join(0,  Jkthr,  kstat); 

gettimeofday (ttp3 ,  ttzp3)  ; 

printf ("Start  Xld ,  XldXn",  tpl. tv.sec,  tpl .tv.usec) ; 

printf ("Threads  created  Xld,  KldXn",  tp2. tv.sec,  tp2 .tv.usec) ; 

printfC'Done  %ld ,  Hld\n",  tp3. tv.sec,  tp3  .tv.usec) ; 

Cre at eGrayScaleMap (disparities) ; 

}  /•  end  real.main  •/ 
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